Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning

#rasbt (Sebastian Raschka)

https://arxiv.org/pdf/1811.12808.pdf

https://arxiv.org/abs/1811.12808

https://ar5iv.labs.arxiv.org/html/1811.12808

Apparently this is a paper version of a series of four blog posts.

ref: https://twitter.com/marktenenholtz/status/1489584583242702850

I wanted to read through it.

Supplementary code: https://github.com/rasbt/model-eval-article-supplementary

It discusses three tasks:

model evaluation

model selection

algorithm selection

The conclusions are summarized in this figure (Figure 23):

https://sebastianraschka.com/images/blog/2018/model-evaluation-selection-part4/model-eval-conclusions.jpg

The paper consists of four parts:

1 Introduction: Essential Model Evaluation Terms and Techniques

Common methods such as the holdout method for model evaluation and selection are covered, which are not recommended when working with small datasets. (Abstract)

"The holdout method is not recommended when working with small datasets."

Topics discussed in Part 1.
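To see why a single holdout split is shaky on a small dataset, here is a minimal stdlib sketch (not the paper's code): it splits a hypothetical 40-sample, two-class 1-D dataset many times with different seeds and records the test accuracy of a toy nearest-centroid classifier. The names `holdout_split`, `make_toy_data`, `fit_nearest_centroid` and `holdout_scores` are all illustrative assumptions.

```python
import random
import statistics


def holdout_split(data, test_fraction=0.3, seed=None):
    """Shuffle a copy of `data` and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def make_toy_data(n, seed=0):
    """Hypothetical 1-D, two-class data: overlapping Gaussians centred at 0 and 1."""
    rng = random.Random(seed)
    return [(rng.gauss(float(i % 2), 1.0), i % 2) for i in range(n)]


def fit_nearest_centroid(train):
    """Toy classifier: remember the mean feature value of each class."""
    by_class = {}
    for x, y in train:
        by_class.setdefault(y, []).append(x)
    return {y: statistics.mean(xs) for y, xs in by_class.items()}


def accuracy(centroids, test):
    """Fraction of test points whose nearest class centroid matches the label."""
    hits = sum(
        min(centroids, key=lambda c: abs(x - centroids[c])) == y for x, y in test
    )
    return hits / len(test)


def holdout_scores(data, n_splits=20, test_fraction=0.3):
    """Repeat the holdout split with different seeds and collect accuracies."""
    scores = []
    for seed in range(n_splits):
        train, test = holdout_split(data, test_fraction, seed)
        scores.append(accuracy(fit_nearest_centroid(train), test))
    return scores


if __name__ == "__main__":
    scores = holdout_scores(make_toy_data(40))  # deliberately small dataset
    print(f"min={min(scores):.2f}  max={max(scores):.2f}  "
          f"stdev={statistics.stdev(scores):.3f}")
```

With only 12 test points per split, each accuracy is a multiple of 1/12, so the estimate jumps noticeably from seed to seed. That spread is the weakness the paper points to for small datasets.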

2 Bootstrapping and Uncertainties

Different flavors of the bootstrap technique are introduced for estimating the uncertainty of performance estimates, as an alternative to confidence intervals via normal approximation if bootstrapping is computationally feasible. (Abstract)

"Different flavors of the bootstrap technique are introduced for estimating the uncertainty of generalization performance estimates."

"If bootstrapping is computationally feasible, it serves as an alternative to confidence intervals via the normal approximation."
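A rough side-by-side of the two kinds of interval, as a stdlib sketch under assumptions. `normal_approx_ci` uses acc ± z·sqrt(acc(1 − acc)/n). `bootstrap_percentile_ci` is a simple percentile bootstrap that resamples only the test-set predictions. It does not retrain the model, so it is a simplification of the retraining-based bootstrap variants the paper covers. Both function names and the 80/100 outcome are hypothetical.

```python
import math
import random


def normal_approx_ci(correct, z=1.96):
    """Normal-approximation CI for accuracy, given 0/1 correctness flags."""
    n = len(correct)
    acc = sum(correct) / n
    half_width = z * math.sqrt(acc * (1.0 - acc) / n)
    return acc - half_width, acc + half_width


def bootstrap_percentile_ci(correct, n_rounds=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample the test-set flags with replacement."""
    rng = random.Random(seed)
    n = len(correct)
    accs = sorted(sum(rng.choices(correct, k=n)) / n for _ in range(n_rounds))
    lower = accs[int(n_rounds * alpha / 2)]
    upper = accs[int(n_rounds * (1 - alpha / 2)) - 1]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical test-set outcome: 80 of 100 predictions correct.
    correct = [1] * 80 + [0] * 20
    print("normal approximation:", normal_approx_ci(correct))  # roughly (0.72, 0.88)
    print("percentile bootstrap:", bootstrap_percentile_ci(correct))
```

For 80/100 correct the normal approximation gives 0.8 ± 1.96 · 0.04 = 0.8 ± 0.0784. The bootstrap interval should land close to that here. The two tend to diverge when n is small or accuracy is near 0 or 1, where the normal approximation is least trustworthy.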

3 Cross-validation and Hyperparameter Optimization

Common cross-validation techniques such as leave-one-out cross-validation and k-fold cross-validation are reviewed, the bias-variance trade-off for choosing k is discussed, and practical tips for the optimal choice of k are given based on empirical evidence. (Abstract)

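"Common cross-validation techniques such as leave-one-out and k-fold cross-validation are reviewed."

"The bias-variance trade-off involved in choosing k is discussed."

"Practical tips for the optimal choice of k, based on empirical evidence, are also given."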

Covers model evaluation and model selection.
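A minimal k-fold cross-validation loop in plain Python, to make the role of k concrete. Leave-one-out is just the special case k = n. This is an illustrative sketch, not the paper's supplementary code. `k_fold_indices`, `cross_val_scores` and the majority-class toy model (`fit_majority`, `score_majority`) are hypothetical names.

```python
import random
import statistics
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_scores(data, k, fit, score, seed=0):
    """Train on k-1 folds and evaluate on the held-out fold, once per fold."""
    scores = []
    for test_idx in k_fold_indices(len(data), k, seed):
        held_out = set(test_idx)
        train = [data[i] for i in range(len(data)) if i not in held_out]
        test = [data[i] for i in test_idx]
        scores.append(score(fit(train), test))
    return scores


# Hypothetical toy model: always predict the majority label of the training set.
def fit_majority(train):
    return Counter(y for _, y in train).most_common(1)[0][0]


def score_majority(label, test):
    return sum(y == label for _, y in test) / len(test)


if __name__ == "__main__":
    rng = random.Random(42)
    data = [(rng.random(), int(rng.random() < 0.7)) for _ in range(30)]
    for k in (5, 10, len(data)):  # k == len(data) is leave-one-out CV
        s = cross_val_scores(data, k, fit_majority, score_majority)
        print(f"k={k:2d}  mean accuracy={statistics.mean(s):.3f}")
```

Larger k means each model trains on more of the data (lower pessimistic bias) but needs more fits, and the per-fold estimates are more correlated. That is the trade-off the paper discusses when choosing k.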

4 Algorithm Comparison

Different statistical tests for algorithm comparisons are presented, and strategies for dealing with multiple comparisons such as omnibus tests and multiple-comparison corrections are discussed. Finally, alternative methods for algorithm selection, such as the combined F-test 5x2 cross-validation and nested cross-validation, are recommended for comparing machine learning algorithms when datasets are small. (Abstract)

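"Different statistical tests for comparing algorithms are presented, and strategies for dealing with multiple comparisons, such as omnibus tests and multiple-comparison corrections, are discussed."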
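As an illustration of what a multiple-comparison correction does, here is a small sketch of two standard ones: Bonferroni, and Holm's step-down method. Holm's method is included only for contrast and is not taken from the paper. The p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m (m = number of comparisons)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values, alpha=0.05):
    """Holm's step-down procedure: test sorted p-values against alpha / (m - rank)."""
    m = len(p_values)
    reject = [False] * m
    for rank, i in enumerate(sorted(range(m), key=lambda j: p_values[j])):
        if p_values[i] > alpha / (m - rank):
            break  # once one test fails, all larger p-values are retained
        reject[i] = True
    return reject


if __name__ == "__main__":
    # Hypothetical p-values from four pairwise algorithm comparisons.
    p = [0.001, 0.013, 0.02, 0.3]
    print("Bonferroni:", bonferroni(p))  # [True, False, False, False]
    print("Holm:      ", holm(p))        # [True, True, True, False]
```

Both keep the family-wise error rate at alpha. Bonferroni is simpler but more conservative, which is why 0.013 and 0.02 survive only under Holm here.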

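"Finally, alternative methods for algorithm selection, such as the combined F-test 5x2 cross-validation and nested cross-validation, are recommended for comparing machine learning algorithms when datasets are small."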

Covers algorithm selection.
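The statistic behind the combined 5x2cv F-test (due to Alpaydin) is short enough to write out. Run 2-fold CV five times. Let p_i^(j) be the difference in error between the two algorithms on fold j of repetition i. Let s_i^2 be the variance of the two differences in repetition i. Then F = (Σ_i Σ_j (p_i^(j))^2) / (2 Σ_i s_i^2), which is approximately F(10, 5) under the null hypothesis. The sketch below computes only the statistic from given differences. The function name, the constant name and the input numbers are hypothetical. The 4.74 cut-off is the usual tabulated upper 5% point of F(10, 5), and scipy is used only if it happens to be installed.

```python
def five_by_two_cv_f_test(diffs):
    """Combined 5x2cv F statistic.

    diffs[i][j] is the difference in error rate (algorithm A minus algorithm B)
    on fold j of repetition i of 2-fold cross-validation.
    """
    if len(diffs) != 5 or any(len(row) != 2 for row in diffs):
        raise ValueError("expected 5 repetitions x 2 folds")
    numerator = sum(p * p for row in diffs for p in row)
    variances = []
    for p1, p2 in diffs:
        mean = (p1 + p2) / 2.0
        variances.append((p1 - mean) ** 2 + (p2 - mean) ** 2)
    denominator = 2.0 * sum(variances)
    if denominator == 0.0:
        raise ValueError("fold differences are identical within every repetition")
    return numerator / denominator


F_CRIT_10_5_ALPHA_05 = 4.74  # tabulated upper 5% point of F(10, 5)

if __name__ == "__main__":
    diffs = [[0.02, 0.04]] * 5  # hypothetical differences, not real results
    f_stat = five_by_two_cv_f_test(diffs)
    print(f"F = {f_stat:.2f}, reject H0 at 5%: {f_stat > F_CRIT_10_5_ALPHA_05}")
    try:
        from scipy.stats import f as f_dist  # optional, for an exact p-value
        print("p-value:", f_dist.sf(f_stat, 10, 5))
    except ImportError:
        pass
```

In practice the ten differences would come from actually training both algorithms on each of the 5x2 splits. Nested cross-validation, the other recommendation, instead wraps hyperparameter tuning in an inner CV loop inside each outer fold.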